Acting the Same Differently: A Cross-Course Comparison 
of User Behavior in MOOCs 


Ben Gelman
Dept. of Computer Science
George Mason University
bgelman@gmu.edu

Aditya Johri
Dept. of Computer Science
George Mason University
johri@gmu.edu


ABSTRACT 

Recent studies of MOOCs demonstrate their ability to reach 
a large number of users, but also caution against the high 
rate of dropout. Some have looked closely at MOOC partic- 
ipation in order to better understand how and when users 
start to disengage, and, if they remain engaged, in what 
activities they participate. Most of this prior work relies 
heavily on descriptive statistics or clustering methodologies 
to highlight basic user participation characteristics. In this
paper, we adapt NMF to provide a multi-dimensional view 
of user participation. We use log data to create a bottom-up 
understanding of user participation, and identify five basic 
behaviors associated with participants’ use of content and 
their engagement with assessment. Furthermore, we perform a
cross-course analysis across four courses and find that these 
five behaviors are present in all courses. Interestingly, users’ 
participation patterns - how they engage in these five be- 
haviors - vary across courses even when the course topics 
are similar. Our methodology can be applied to other data 
sets, and findings from this work can assist in interventions 
to help users successfully accomplish their learning goals.

1. INTRODUCTION


As Massive Open Online Courses (MOOCs) grow in popu- 
larity, and offer an increasing variety of subjects across mul- 
tiple platforms, there has been significant interest in MOOC 
users’ participation patterns. Extremely low user comple- 
tion rates [6] have motivated examinations and studies of 
MOOC behavior that aim to ascertain whether changes in 
pedagogy can improve completion outcomes, or if every in- 
coming class contains a cohort of users that had no intention 
to complete.

We were motivated by this recent work to attempt to better
understand MOOC users’ behavioral patterns, and the
evolution of participation over time and across courses. In 
this paper, we analyze data from four MOOC courses across 
three axes (learners, time, and courses), choosing methods 
that link behaviors and patterns across these three dimen- 
sions. Utilizing the rich features developed to characterize 
learners’ weekly interactions, we adapt non-negative matrix 
factorization (NMF) [5] to study the importance of these 
features and the behavior of users over time [2]. 


Several factors make NMF particularly well-suited for this 
type of analysis. The non-negativity constraint helps to 
identify distinct but additive latent factors. In other words, 
we are able to learn user behaviors in terms of evolving parts 
due to NMF’s additive latent factors and our temporal adaptation
(linking behaviors across weeks). Through this study,
we make the following unique contributions: 1) We identify
behavioral patterns of users that are consistent across
multiple MOOCs; 2) We demonstrate how these behaviors 
vary across different courses; and 3) We demonstrate the 
feasibility of a framework that can be applied across similar 
multi-dimensional datasets. 


2. RELATED WORK 

Several studies of MOOCs highlight low completion rates 
[13]. The University of Edinburgh launched six MOOCs on 
the Coursera platform in January 2013 [7]. Evaluations re- 
vealed that, of the 309,682 learners initially enrolled, 123,816 
(about 40%) accessed the course sites during the first week 
(‘active learners’), and 90,120 (about 29%) engaged with 
course content. Over the duration of the course, the num- 
ber of active participants rose to 165,158 (53%). As a gauge 
of persistence, 36,266 learners (nearly 12%) engaged with 
week 5 assessments. This represented 29% of initial active 
learners (although individual numbers for each of the six 
courses ranged from 7% to 59%). In addition, 34,850 people 
(roughly 11% of those who enrolled) achieved a statement of 
accomplishment for reaching a percentage-based benchmark 
of course completion. 


Similarly, when Duke University ran a Bioelectricity MOOC 
in 2012 [15], 12,175 students initially registered. Only 313 
participants (2.6%) achieved a statement of accomplishment.
Learner feedback suggested three specific reasons for
failure to complete [15]. [8] provides a compilation of avail- 
able data on MOOC completion. Further analysis of the 
data shows that, of the 61 courses hosted by Coursera, the 
average completion rate was just over 6%. This combination 
of MOOCs’ enormous popularity and extremely low comple- 
tion rate has attracted significant interest. 


[17] used a classification method that identifies a small num- 
ber of longitudinal engagement trajectories in MOOCs. This 
classifier consistently identifies four prototypical trajectories 
of engagement: (1) Completing, (2) Auditing, (3) Disengag- 
ing, (4) Sampling. To decide these engagement patterns, 
the authors used a number of binary variables to determine 
whether a student accessed a resource or attempted a prob- 
lem. In contrast, we begin to extract a number of richer 
descriptors about the students’ interaction with the online 
learning platform. 


[9] divides participants into five profiles: no-shows (those 
who register but never log in); observers (those who log in 
but do not take assessments); drop-ins (those who participate
but do not attempt to complete the entire course); passive
(those seeing the course as content to consume); and
active (those participating in all the activities and enriching 
the course). Similarly, [16] distinguishes five groups of peo- 
ple depending on their level of participation in the MOOC 
forum: inactive (those that do not visit the forum); pas- 
sive (those that just consume information); reacting (those 
that add further aspects to existing questions); acting (those 
that post questions and lead discussions); and supervis- 
ing/supporting (those that lead discussions and summarize 
gained insights). 


3. DATA 


Our study utilizes four courses: 6.002x (Fall 2012 and
Spring 2013), Circuits and Electronics; 2.01x (Spring
2013), Elements of Structures; and 3.091x (Spring 2013),
Introduction to Solid State Chemistry. After filtering out learners
who had no browsing events for the duration of the courses,
the course sizes are 17,379, 6,339, 5,597, and 8,870 users,
respectively. The course durations are all set to 14 weeks.
Using the scripts from the MOOCdb project, we are able to 
extract 21 features. Table 1 shows the feature numbers and 
descriptions. 


Figure 1 presents the course sizes dynamically. The count 
of active users for any week is given by the sum of users 
that have at least one non-zero feature in that week. The 
count of inactive users is the sum of users that have all-zero 
feature values in the current week, but had been active in a 
prior week. New users are those whose first non-zero feature 
is in the current week. The dropout value is the number of 
students who are inactive this week and will be inactive for 
all future weeks. 
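
The four status counts defined above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical input `active` with one row per user, where `row[t]` is True when the user has at least one non-zero feature in week t; the function name and data layout are illustrative and are not taken from the MOOCdb scripts.

```python
def weekly_status_counts(active):
    """Count active, inactive, new, and dropout users for every week.

    `active` holds one row per user; row[t] is True when the user has at
    least one non-zero feature in week t (hypothetical layout).
    """
    n_weeks = len(active[0])
    counts = [{"active": 0, "inactive": 0, "new": 0, "dropout": 0}
              for _ in range(n_weeks)]
    for row in active:
        active_weeks = [t for t, flag in enumerate(row) if flag]
        if not active_weeks:
            continue  # users with no activity at all are filtered out
        first, last = active_weeks[0], active_weeks[-1]
        for t in range(n_weeks):
            if row[t]:
                counts[t]["active"] += 1
                if t == first:
                    counts[t]["new"] += 1      # first non-zero week
            elif t > first:
                counts[t]["inactive"] += 1     # idle now, active earlier
                if t > last:
                    counts[t]["dropout"] += 1  # idle now and in all later weeks
    return counts
```

Under these definitions, new users are also counted as active, and dropouts are a subset of the inactive users.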


Because some features are complex and not fully explained 
by their feature names, we will expand their definitions here. 
Each feature is computed using the data collected in a week, 
and generates a single value, so if there are 14 weeks in a 
course, a user’s feature vector will contain 14 values per 
feature.

Figure 1: Student activity statuses over time for each class.
Vertical lines denote midterm exams and quizzes.


Time spent: Feature 1 sums a user’s total time spent on 
any and all events in the course. Feature 11 is the single 
longest time spent on any single resource (book, wiki, lecture 
videos, etc). Feature 12 is the time specifically spent on 
lectures, and feature 13 is the time spent on the course wiki. 


Homework participation: Feature 4 is the count of all 
unique problems a learner attempted [1]. Feature 5 is the 
count of all attempts, including multiple tries at the same 
problem. Feature 6 is the count of all problems that the 
learner got correct (grade 1). Feature 7 is the average num- 
ber of attempts per problem. Feature 18 counts all correct 
attempts, in order to identify users that correctly solve the 
same problem multiple times. 


Ratio-based features: Feature 8 measures the total time
spent on the course per correct problem by dividing feature
1 by feature 6. Feature 9 divides the number of attempts (feature
5) by the number of correct problems (feature 6). Feature
19 divides total attempts (feature 5) by non-distinct correct
attempts (feature 18).
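
As a small worked illustration of features 8 and 9, the sketch below derives them from one user's weekly base features. The dictionary layout keyed by feature number and the choice to map a zero denominator to 0.0 are assumptions of this sketch; the text does not say how undefined ratios are handled.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero.

    Mapping undefined ratios to 0.0 is an assumption of this sketch; it keeps
    the value non-negative, which NMF requires.
    """
    return numerator / denominator if denominator else 0.0


def ratio_features(week):
    """Compute features 8 and 9 from a dict of base features for one week."""
    return {
        8: safe_ratio(week[1], week[6]),  # time spent per correct problem
        9: safe_ratio(week[5], week[6]),  # attempts per correct problem
    }
```

For example, 120 time units, 6 attempts, and 3 correct problems give 40 time units and 2 attempts per correct problem.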


Difference-based features: Features 14-17 represent the 
change in features 2, 7, 8, and 9, respectively. This is com- 
puted by taking the respective feature’s value for the current 
week, subtracting the previous week, and then normalizing 
the result. 


Regularity and procrastination: Feature 10 tells us how
spread out a student’s schedule is over the week by presenting
the variance of his or her event timestamps. Feature
20 computes the average amount of time by which the user
submits before the deadline (a zero value means an on-time
submission, while a higher value means the work was submitted
earlier). Finally, feature 21 calculates the standard deviation
in working hours throughout the day: if the student
starts work around the same time every day, the feature
value will be low.
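
To make the contrast between features 10 and 21 concrete, the following sketch computes both from a list of event times. It assumes a hypothetical input in which each event is expressed as hours elapsed since the start of the week, and it uses population statistics; the units and the exact estimator used by the feature-extraction scripts are not stated in the text.

```python
from statistics import pstdev, pvariance


def regularity_features(event_hours):
    """Return (feature 10, feature 21) for one user's events in one week.

    `event_hours` lists each event as hours since the start of the week
    (hypothetical representation).
    """
    if len(event_hours) < 2:
        return 0.0, 0.0  # too few events to measure spread (assumption)
    timestamp_variance = pvariance(event_hours)
    hours_of_day = [h % 24 for h in event_hours]  # fold onto a single day
    return timestamp_variance, pstdev(hours_of_day)
```

A student who works at 9:00 on three consecutive days has a high timestamp variance (feature 10) but a zero standard deviation in working hours (feature 21), whereas a student with a single session from 9:00 to 10:00 has low values for both.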


Feature extraction allows us to represent learners as a set of
multiple time series. A learner’s basic actions are collected
and summarized into the 21 interpretive features on a weekly
basis. Because learners are represented as a set of features
with per-week, aggregate values, time is a dimension of our
data set.


Table 1: Students’ features.

Feature | Name
1       | sum_observed_events_duration
2       | number_of_forum_posts
3       | average_length_of_forum_posts
4       | distinct_attempts
5       | number_of_attempts
6       | distinct_problems_correct
7       | average_number_of_attempts
8       | sum_observed_events_duration_per_correct_problem
9       | number_problem_attempted_per_correct_problem
10      | observed_event_timestamp_variance
11      | max_duration_resources
12      | sum_observed_events_lecture
13      | sum_observed_events_wiki
14      | difference_feature_2
15      | difference_feature_7
16      | difference_feature_8
17      | difference_feature_9
18      | attempts_correct
19      | percent_correct_submissions
20      | average_predeadline_submission_time
21      | std_hours_working


4. METHODOLOGY 


Uncovering the behaviors of MOOC students requires si- 
multaneously finding interaction patterns (behaviors) across 
a large number of students and permitting individual stu- 
dents to exhibit multiple behaviors. Since we assume stu- 
dent interactions may be the result of multiple behaviors, 
we choose to use a decomposition method (NMF) which re- 
sults in a parts-based representation of student interactions. 
Students may exhibit multiple behaviors and their behaviors 
may change over time. 


Step 1: Apply NMF. Given a three-dimensional vector
representation of the student feature data with w
weeks, f features, and n users, we construct the tensor
$A_{ijk}$. We begin by applying non-negative matrix factorization
to each feature-user matrix $A_i$ for $i = 1, \ldots, w$.
We use a standard implementation [14] with NNDSVD
[3] for initialization of the basis matrix and a Frobenius
cost function. The rank parameter, r, is set to six,
which is selected through approximation.


$$A_i = B_i C_i \qquad (1)$$


The results of factorizing $A_i$ are $B_i$ and $C_i$, the basis
and coefficient matrices, respectively. The dimensions
of $B_i$ are $f \times r$ and the dimensions of $C_i$ are $r \times n$.


Each of the r column vectors in $B_i$ contains f values
that essentially describe the importance of each feature
to the given column vector. In our data, we use
the set of important features in each basis vector to
describe a behavior. In matrix $C_i$, there are r row
vectors that contain n coefficient values, one for each
user. The coefficient values in the m-th row vector of $C_i$
describe how closely users associate with the m-th basis
vector in $B_i$. Because every user has r coefficient
values, it is possible for a user to identify with multiple
basis vectors. This is significantly different from hard
clustering approaches such as K-means, where groups
are mutually exclusive.
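
The factorization in Equation (1) can be illustrated with a dependency-free sketch. Note that this is not the implementation used in the paper: where the text uses a standard implementation with NNDSVD initialization, the sketch below substitutes random initialization and the classic multiplicative update rules for the Frobenius objective, purely to show the shapes and the non-negativity of the factors. The function names and the default iteration count are illustrative assumptions.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def nmf(A, r, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative f x n matrix A into B (f x r) and C (r x n)."""
    rng = random.Random(seed)
    f, n = len(A), len(A[0])
    # Random positive initialization (the paper uses NNDSVD instead).
    B = [[rng.random() + eps for _ in range(r)] for _ in range(f)]
    C = [[rng.random() + eps for _ in range(n)] for _ in range(r)]
    for _ in range(iterations):
        # C <- C * (B^T A) / (B^T B C), element-wise
        Bt = transpose(B)
        num, den = matmul(Bt, A), matmul(matmul(Bt, B), C)
        C = [[C[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(r)]
        # B <- B * (A C^T) / (B C C^T), element-wise
        Ct = transpose(C)
        num, den = matmul(A, Ct), matmul(B, matmul(C, Ct))
        B = [[B[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)]
             for i in range(f)]
    return B, C
```

Calling `nmf(A_i, 6)` once per week would yield the w pairs of basis and coefficient matrices used in the following steps; each column of B is a candidate behavior and each column of C holds one user's r coefficients.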


Step 2: Alignment. After performing the matrix factorization
on each week, we have w basis matrices and w
coefficient matrices. To identify persistent basis vectors
and patterns, we must connect the results over
time. There is no guarantee that the order of the basis
vectors is consistent over all weeks because the basis
matrices are produced by independent executions
of NMF. To align them, we first compute the cosine
similarity using Equation (2) between basis vectors of
consecutive weeks. In other words, for each of the r basis
vectors in week i, we compute the cosine similarity to
all basis vectors in week i + 1, resulting in $r^2$ computations.
Ultimately, there are $(w - 1) r^2$ similarity
computations.¹


$$\cos(\theta) = \frac{U \cdot V}{\|U\| \, \|V\|} \qquad (2)$$


By examining the distribution of cosine similarity val- 
ues, an alignment threshold may be selected. For our 
data, a threshold value of 0.95 was chosen to identify 
matching basis vectors between weeks. We found that 
after the first week, all basis vectors uniquely match 
one and only one basis vector in the consecutive week 
when a threshold of > 0.95 is used. This phenomenon 
occurred for all four courses we used in our experi- 
ments. Although basis matrices for each week are esti- 
mated independently, we find five basis vectors which 
persist over time and occur in all the classes. 
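
The alignment step can be sketched as follows. This is a minimal illustration under stated assumptions: each week's basis is passed as a list of r vectors (the columns of the basis matrix), the function names are hypothetical, and a match is taken to mean a cosine similarity of at least the threshold.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(basis_now, basis_next, threshold=0.95):
    """Map each basis vector index in week i to its matches in week i + 1.

    Performs all r * r comparisons between the two weeks and keeps the
    pairs whose similarity reaches the threshold.
    """
    return {
        j: [k for k, v in enumerate(basis_next)
            if cosine_similarity(u, v) >= threshold]
        for j, u in enumerate(basis_now)
    }
```

Because cosine similarity ignores vector length, two basis vectors with the same feature profile match even when one week's factorization scales them differently. A one-to-one alignment corresponds to every list in the result containing exactly one index.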


Step 3: Normalize and define behaviors. The aligned,
per-week basis vectors are normalized. We then av- 
erage these aligned-normalized vectors into a single, 
representative behavioral vector. Having a single, nor- 
malized vector permits a semantic interpretation of the 
behavior based on relative feature values. By identi- 
fying the most important features (the ones with the 
largest values) in each behavioral vector, we are able to 
label the vectors by the interaction pattern they best 
represent. 
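
Step 3 can be illustrated with the short sketch below. The text does not state which normalization is used; the sketch assumes each vector is scaled so that its entries sum to one, which is consistent with the columns of Tables 2 and 3 but remains an assumption. The function names are hypothetical.

```python
def normalize(vector):
    """Scale a non-negative vector so its entries sum to one (assumed norm)."""
    total = sum(vector)
    return [x / total for x in vector] if total else list(vector)


def behavior_vector(aligned_vectors):
    """Average the normalized per-week basis vectors of one aligned behavior."""
    normalized = [normalize(v) for v in aligned_vectors]
    return [sum(col) / len(normalized) for col in zip(*normalized)]


def dominant_features(behavior, top=3):
    """Return the 1-based feature numbers with the largest weights."""
    order = sorted(range(len(behavior)), key=lambda i: behavior[i], reverse=True)
    return [i + 1 for i in order[:top]]
```

Normalizing before averaging keeps a week with unusually large basis values from dominating the representative vector, so the labels depend on relative feature weights only.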


Step 4: Coefficient analysis 


Every student’s interaction attributes may be approx- 
imated using a weighted mixture of the discovered be- 
havior vectors. These weights (coefficients) can be con- 
sidered to define a soft-membership of a student to a 
behavior. 


¹ We choose cosine similarity because it is a measure of angular
similarity between two vectors. Thus, two basis vectors
whose only nonzero entry is feature j will be extremely
similar. This is valuable for aligning basis vectors whose
distributions of features are similar.


In order to decide if a user belongs to a behavior, we
threshold the distribution of the coefficient values per
week and per behavioral vector (or basis). This means
that the algorithm will generate $r \times w$ thresholds. The
thresholding algorithm takes the entire range of coefficient
values per vector and limits the range of values
to the top x%. The threshold (top x%) is a parameter.
This means that if the range of coefficient values
for a behavior is 0-100, then selecting a threshold of
0.85 will only consider users with coefficient values of
85-100 to be exhibiting that behavior. There is an
additional minimum size parameter s that adjusts for
a skewed distribution where a few users have significantly
higher coefficient values than any other users.
This skewed distribution causes the top x% of coefficient
values to only include these few users. If the
number of users within the top x% is less than s,
then the users will be saved, and the threshold computation
will be repeated without them. For our data, we
use a threshold of 0.85 with a minimum size parameter
of 30.
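
The thresholding procedure of Step 4 can be sketched for a single behavior in a single week. This is one possible reading of the description, not a definitive implementation: the cutoff is placed at the given fraction of the range between the smallest and largest remaining coefficient, and users that are "saved" are assumed to count as members of the behavior. The function and parameter names are hypothetical.

```python
def behavior_members(coefficients, top=0.85, min_size=30):
    """Return the indices of users considered to exhibit one behavior.

    `coefficients` holds one coefficient per user for a single behavior
    and a single week.
    """
    remaining = dict(enumerate(coefficients))
    saved = set()
    while remaining:
        low, high = min(remaining.values()), max(remaining.values())
        cutoff = low + top * (high - low)
        selected = {u for u, c in remaining.items() if c >= cutoff}
        # Stop once the group is large enough or nobody is left to exclude.
        if len(selected) >= min_size or len(selected) == len(remaining):
            return saved | selected
        saved |= selected        # keep the outliers as members (assumption)
        for u in selected:
            del remaining[u]     # recompute the range without them
    return saved
```

Running this once per week and per behavior yields the $r \times w$ thresholds mentioned above. Each pass removes at least the user with the largest coefficient, so the loop always terminates.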


We assign behaviors to students for each week using 
the data-derived thresholds. By tracking the set of be- 
haviors across weeks, we generate a transition diagram 
that presents the number of students exhibiting each 
behavior over each week and the migration of users 
between various behaviors. The transition diagram al- 
lows us to understand the evolution of user behavior 
as a course progresses. 
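
The counts behind such a transition diagram can be sketched as follows. The sketch assumes a hypothetical input with one dictionary per week that maps each user to the set of behaviors assigned to that user; the function name and data layout are illustrative.

```python
from collections import Counter


def behavior_transitions(weekly_sets):
    """Count user migrations between behavior sets in consecutive weeks.

    `weekly_sets` is a list over weeks of dicts mapping a user id to a
    frozenset of behavior labels. Returns one Counter per pair of
    consecutive weeks, keyed by (behaviors_this_week, behaviors_next_week).
    """
    transitions = []
    for now, nxt in zip(weekly_sets, weekly_sets[1:]):
        counts = Counter()
        for user, behaviors in now.items():
            if user in nxt:
                counts[(behaviors, nxt[user])] += 1
        transitions.append(counts)
    return transitions
```

Keying the counts by sets of behaviors, rather than single labels, preserves cases in which a student exhibits several behaviors in the same week.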


5. BASIS MATRIX RESULTS 

The resulting basis matrices for 6.002x (Fall 2012) exhibit
seven unique behaviors (deep, consistent, bursty, introduction,
sampling, performance, and response). Tables 2 and 3 numerically
summarize behaviors for week one and the average of the other
weeks, respectively. Because the first week manifests two 
unique behaviors, namely introduction and sampling, it is 
kept separate. From the second week onwards, all behaviors 
are persistent (at least 95% cosine similarity). This allows 
us to average weeks two through 14 in Table 3. 


Basis vector one is dominated by feature 11 
(max_duration_resources), which is the duration of the 
longest observed event this week. This vector represents a 
deep behavior, because the associated students must have 
spent a long time on a single resource. 


Basis vector two is primarily decided by feature 10
(observed_event_timestamp_variance). Because this feature
tells us how spread out the student’s schedule is over the
week, this vector describes a consistent behavior. Having
a high timestamp variance requires users to log in multiple
times a week.


Basis vector three is primarily decided by feature 21
(std_hours_working), which is the standard deviation in
working hours over the day. This could represent a bursty
behavior: because a user must be active during different
times in a day to obtain a high feature value, this could
mean that the user has a single prolonged session or multiple,
separate sessions.


Two basis vectors exist only in the first week of the
course. Basis vector four in Table 2 is decided by feature
three (average_length_of_forum_posts) and feature two
(number_of_forum_posts). This supports the idea that users
interacted heavily during the opening week of the course. The
disappearance of this basis vector, however, tells us that forum
interaction in later parts of the course was insignificant
in 6.002x fall 2012. For this reason, this basis vector
characterizes an introduction behavior.


Table 2: Matrix of normalized basis vectors (behaviors) for
week 1 (course 6.002x fall 2012). The behaviors Introduction
and Sampling are unique to week 1. Dominant feature values
are shown in boldface.

Feature | Deep   Consistent  Bursty  Introduction  Sampling
1       | 0.012  0.000       0.001   0.000         0.088
2       | 0.000  0.000       0.000   0.137         0.000
3       | 0.000  0.000       0.000   0.862         0.000
4       | 0.000  0.000       0.000   0.000         0.000
5       | 0.000  0.000       0.000   0.000         0.000
6       | 0.000  0.000       0.000   0.000         0.000
7       | 0.000  0.000       0.000   0.000         0.000
8       | 0.000  0.000       0.000   0.000         0.000
9       | 0.000  0.000       0.000   0.000         0.000
10      | 0.000  0.988       0.000   0.000         0.000
11      | 0.981  0.011       0.000   0.001         0.000
12      | 0.000  0.000       0.000   0.000         0.665
13      | 0.000  0.000       0.000   0.000         0.000
14      | 0.000  0.000       0.000   0.000         0.000
15      | 0.000  0.000       0.000   0.000         0.000
16      | 0.000  0.000       0.000   0.000         0.000
17      | 0.000  0.000       0.000   0.000         0.000
18      | 0.000  0.000       0.000   0.000         0.000
19      | 0.000  0.000       0.000   0.000         0.000
20      | 0.000  0.000       0.000   0.000         0.000
21      | 0.008  0.000       0.999   0.000         0.248


Basis vector five in Table 2 is defined by features 12
(sum_observed_events_lecture), 21 (std_hours_working), and
1 (sum_observed_events_duration). This group of features
supports the hypothesis that users are browsing through a
lot of content during the first week of the course. This may
be because users are interested in seeing what lies ahead in
the course, or because some users may have joined only to
gather information on one particular topic. Thus, basis vector
five during the first week expresses a sampling behavior.


After the first week, two more basis vectors appear and persist. At
this point, basis vector four is primarily characterized by
feature 19 (percent_correct_submissions). By turning in
assignments with high correctness, the corresponding students
can be associated with a performance behavior. Basis
vector five is strongly defined by feature 20
(average_predeadline_submission_time). By turning in assignments
long before their deadlines, these students can be associated
with a response behavior.


When we apply the same analysis to other courses, we see
similar behaviors. The average basis matrix tables for 2.01x,
3.091x, and 6.002x (Spring 2013) are not displayed because they exhibit
the same behaviors as Table 3 with 95% cosine similarity.
All five behaviors (deep, consistent, bursty, performance,
and response) appear in all of
the courses. The key difference is that 6.002x has two additional
behaviors that occur only in the first week. The
introduction and sampling behaviors do not appear to be
prevalent in the other courses. This could be due to course
sizes, and the fact that 6.002x was the first edX course ever
released. Users may have been encouraged to communicate
in the forums early on (introduction), or there may have
been users testing the waters of this new online course platform
(sampling).


Table 3: Average matrix of normalized basis vectors for
weeks 2 through 14 (Course 6.002x, Fall 2012). Dominant
feature values are shown in boldface.

Feature | Deep   Consistent  Bursty  Performance  Response
1       | 0.031  0.002       0.007   0.000        0.000
2       | 0.001  0.000       0.001   0.000        0.000
3       | 0.004  0.001       0.003   0.000        0.000
4       | 0.005  0.000       0.000   0.000        0.029
5       | 0.003  0.000       0.000   0.001        0.012
6       | 0.000  0.000       0.000   0.052        0.000
7       | 0.00   0.000       0.000   0.003        0.003
8       | 0.000  0.000       0.000   0.001        0.001
9       | 0.000  0.000       0.000   0.001        0.001
10      | 0.00   0.993       0.000   0.000        0.000
11      | 0.922  0.000       0.005   0.007        0.028
12      | 0.010  0.000       0.002   0.000        0.000
13      | 0.000  0.000       0.000   0.000        0.000
14      | 0.00   0.000       0.000   0.000        0.000
15      | 0.00   0.001       0.000   0.002        0.002
16      | 0.000  0.000       0.000   0.000        0.000
17      | 0.000  0.000       0.000   0.000        0.000
18      | 0.000  0.000       0.000   0.015        0.000
19      | 0.002  0.000       0.000   0.743        0.004
20      | 0.000  0.000       0.000   0.174        0.920
21      | 0.017  0.000       0.980   0.000        0.000


6. STUDENT TRANSITIONS 


After applying the thresholding algorithm, we generate user 
behavior transition diagrams for each course. The size of
each colored bar is scaled according to the number of students
exhibiting the behavior. The transition lines in between
the bars are sized and directed based on user migration
between sets of behaviors.


Using these diagrams, we can observe changes in the behav- 
iors themselves, and the transitional motifs that occur due to 
user migration. After the first week or two, a single behavior 
persists as the largest. Additionally, this behavior tends to 
act as a hub for user migration. This hub behavior differs
from course to course, which highlights that the behaviors may manifest
differently despite the existence of the same five behaviors
among all four courses.


In 2.01x, most user migration occurs into and out of the response
behavior, with a secondary focus on the deep behavior.
Notable moments occur in week 5 and weeks 10 to 12,
where migration between consistent and deep occurs. Otherwise,
there are several recurrent transitions. These motifs
include each permutation of deep and/or response migrating 
to deep and/or response. 


In 3.091x, most user migration occurs into and out of the
performance behavior. Most unusually, there is very little
migration in the entire first half of the course. Only in the
second half does migration pick up to levels we would have
expected given the results of the other courses. Although
some migration patterns through the performance behavior
repeat occasionally, they only occur for two to three weeks
at a time. Thus, we do not infer any transitional motifs from
this course.


Figure 2: User behavior transitions over time. Vertical bars
are numbers of students performing each behavior. Diagonal
groups indicate transitions: for example, one such transition
indicates students who were Deep and Bursty and have
transitioned to Consistent. Transition thickness is the log of
the number of students involved.


In 6.002x fall, most user migration occurs through the deep 
behavior, with a secondary focus on the consistent behav- 
ior. A unique circumstance occurs between weeks one and 
two with the migration of the initially enormous bursty be- 
havior. Besides this, the transitional motifs include each 
permutation of deep and/or consistent migrating to deep 
and/or consistent. 


In 6.002x spring, most user migration occurs through the 
performance behavior. Unlike the other courses, there are 
two more behaviors through which there is significant mi- 
gration: the deep and bursty behaviors. As a result, we 
see many more motifs than simply the permutations of the 
top two behaviors. In the early weeks, migration is heaviest 
through deep and performance. This means that early on, 
users are both engaged and performing well. In the mid- 
dle weeks, during and after the midterm, there is a chaotic 
shuffle between behaviors as users deal with the course dif- 
ferently. In the later weeks, however, deep migration falls 
off and users mostly move between bursty and performance. 
This may suggest that users are capable of finishing their 
work in a single day or two and achieving high correctness 
simultaneously. This result could perhaps reflect a decreased 
difficulty in the later weeks of the course. The occurrence of 
multiple large behaviors appears to tell us more about the
evolution of user behavior. 


7. CONCLUSION 


In this comparative study of four MOOC courses, we show 
how users follow five specific behaviors across the courses. 
We found that although these behaviors are common, their 
patterns of occurrence vary across courses. Through our 
multi-dimensional data and our adaptation of NMF, the re- 
sults reveal in great detail the differences in behavior over 
time between the courses. Because our method analyzes 
behavior at every step of the MOOC experience, our work 
can improve the learning experience for all users, not just 
those that plan to finish the course. For future work, we can 
expand the purposes of user behavior trajectories by using 
Markov modeling for prediction. We can add newer, more 
descriptive features in addition to running the analysis with 
a higher rank in order to discover possible alternative be- 
haviors. If course outcomes and assessment information are 
available, we can combine these with the dynamic behav- 
ioral motifs to better understand the underlying processes 
that fuel behavioral changes.